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1. Introduction 

Pathogenic viruses, in order to survive and successfully reproduce have to fight against 
the immune system of their host organisms. Some viruses use evolvability as a successful 
strategy to escape acquired immunity. In the presence of adaptive response in the host 
newly arising mutants can acquire a competitive advantage with respect to the wild 
type. Neverthless, in some viral populations one often observes "quasispecies" behavior, 
in which individuals are strongly similar to one another. 

A prototypical example of this behavior is exhibited by the Infiuenza A virus, 
which makes use of high antigenical variability (genetic drift) to escape acquired immune 
response. Virus strains circulating in different epidemic seasons, in spite of eliciting a 
certain amount of cross-response by the immune system of the hosts, mutate fast enough 
to be able to infect the same host several times in the course of its life. However in any 
given observed epidemics, the viral population is sufficiently concentrated around a well 
defined strain that effective vaccines can be prepared. Moreover, antigenic changes are 
punctuated: antigenic assays show that the sequences of the dominant circulating type 
H3N2 fall into temporally correlated clusters with similar antigenic properties ([I]). This 
behavior, which has been described as an evolving viral quasispecies, contrasts with the 
prediction of naive models of viral evolution (see, e.g., [2]), where the interaction with 
the immune system leads to proliferation of strains with different antigenic properties 
and, consequently, to the impossibility of preparing effective vaccines. It has recently 
been experimentally shown [3] that some bacteria in a changing environment can develop 
a genotype which produces highly variable phenotypes, presumably adapted to different 
environments. One may speculate whether such a bet-hedging strategy is available to 
viruses, due to the small size of their genetic material and their high mutation rate, 
especially for RNA-based viruses. 

A different example is the one of the in-host evolution of the HIV, which exploits 
high mutation rates to counter the immune adaptation of its host (see, e.g., [HE]). 
Despite its high mutation and substitution rates, the viral population shows low levels 
of differentiation during most of the asymptomatic phase. 

Several solutions have been proposed to this evolutionary puzzle, based on the 
analysis of mathematical models. In [6], elaborating on previous theory ([7J), it was 
proposed that in the case of influenza A, a short-time strain-trascending immunity 
could give account for the quasispecies behavior. It was observed in [S] that this short- 
term immunity would be ineffective in the absence of heterogeneities in infections, due 
either to the structure of network of infective contacts or to variations in the infective 
efficacy of different strains. An alternative explanation was put forward in [9] based on 
the analysis of a model, where the heterogeneity of the neutral phenotypic network of 
the evolving virus is taken into account. An important feature in influenza epidemics 
seems to be the directionality of its propagation. According to the analysis of [10] and 
[TT] yearly epidemics typically start from seeds in South-East Asia before spreading 
worldwide. This suggests the presence of strong bottlenecks in the viral population that 
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could lead to relatively small effective population sizes and to a strong reduction of 
genetic variability. 

Models can suggest mechanisms that can produce the observed evolutionary 
pattern. However, even the simplest individual based model of viral evolution in a 
host population ([H]) is too complicated to lead to a detailed exploration of parameter 
space. It is not clear which conditions on evolutionary and epidemiological parameters 
allow for the evolving quasispecies behavior. In particular we will be interested to the 
dependence on the population size. We wish to ascertain whether in the presence of the 
infectivity reduction due to the immune response of the host, the quasispecies behavior 
can appear for some parameter ranges in the infinite population, or is only possible in 
sufficiently small populations. This last possibility would emphasize the importance of 
population bottlenecks in the ecology of the virus. 

In this paper we introduce a simplified model of evolving viral population that 
keeps into account the interaction between virus and host immunity in an effective way. 
The effectiveness of reproduction of a given viral type at a given time depends on its age 
and its past frequency in the host population. We get a model that is simple enough to 
allow to study at least numerically, the dependence on the different parameters, and in 
particular on the population size. Our findings suggest that quasispecies behavior and 
punctuated equilibria are only possible for small enough populations, whereas, if the 
model parameters are not scaled with the population size, one always has proliferation 
in the large population limit. 

The organization of the paper is the following: in section [2] we define our model 
as an extension of Kingman's house of cards model, where strains are represented by 
self-reproducing entities and the immune memory of the host is taken into account by 
the decrease of the strain fitness with time. In this model, the fitness of a mutant is 
independent of that of its parental strain. This assumption is justified in our approach 
by the fact that since the parental fitness decreases very quickly due to the immune 
adaptation of the host, we do not expect long-lived fitness correlations to exist among 
related strains. In section [3] we recall some known results about the dynamics of the 
model in the absence of memory. In section |4] we study the effect of the immune memory 
in numerical simulations and show the existence of the evolving quasispecies regime. In|5] 
we analyze in more detail the behavior of the system in the evolving quasispecies regime. 
In |6] we introduce a simple effective stochastic process that helps in understanding the 
dynamical behavior of the quasispecies regime. Finally some conclusions are drawn. 

2. The Model 

We consider here a model of an evolving viral population of constant size consisting 
of individuals. The population evolves according to a Wright-Fisher process for 
asexually reproducing populations ([El [13]). At each discrete time step t the population 
reproduces and is completely replaced by its progeny. Each individual is characterized 
by its strain label 5*. The expected offspring size of an individual belonging to strain S 
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is proportional to the Wrightian fitness ws{t) of its strain. Tlie fitness ws{t) of a strain 
5* depends on its intrinsic fitness Wg, proportional to its basic reproductive number, and 
changes with time as described below. In what follows we shall also use the notation 
ws = G^'^ and call fs the Fisher fitness of S. As it is well known (^14]) the Wright-Fisher 
model can be described as a process where each individual j at generation t+1 "chooses" 
its parent i at generation t with probability equal to wsi(t)/ J2j^=i '>^Sjit). Consequently, 
for large A^, the offspring size of an individual z is a random Poissonian variable with 
average ws,{t)/ {w)^. 

At each reproduction event, a mutation can take place with probability ^. Upon 
mutation, a new strain S' appears, and its intrinsic fitness w^, is drawn from a fixed 
probability distribution p{w) ([8l |15l [161 HZ]), independently for each mutation event: 
this corresponds to the infinite allele approximation where no back mutations are 
possible. The fitness ws' of the newly formed strain is equal to its intrinsic fitness 
Wg,. If fitnesses did not depend on time, the model would coincide with Kingman's 
house-of-cards model ([IS]). 

The effect of acquired immunity of the host population on the viral reproduction is 
represented by letting the fitness ws of each strain 5* decrease at each generation with a 
rate proportional to the number Ns(t) = Nns(t) of individuals belonging to that strain 
in the population. For simplicity we consider an exponential decrease of the fitness: 

^5(t+l) = ^5(t)e-'^^^W/^, (1) 

where h is a parameter which determines the rate of fitness decrease. 

In the following we will describe mainly the case in which p{w) is a log-normal 
distribution p{w) oc e~'°s^"'/^"^/w. We have also investigated the model with the 
uniform fitness distribution finding the same qualitative results. 



3. Dynamic behavior in the absence of immunity 

We start by reviewing the behavior of the model for h = 0, i.e., in the absence of 
immune memory of the host ( [T5l fTTl ITS] ) . In this case, in the limit of large populations 
A^ — )■ oo, it is possible to derive an exact equation for the evolution of the fraction Xt{w) 
of individuals with fitness w at time t, namely 

w 

xt+i{w) = —— (l- p)xtiw) + pp{w), (2) 

where {w)^ = f dw wxtiw). The nature of the solution of this equation depends on the 
properties of p{w). If p{w) has a compact support (i.e., p{w) = for w > Wmax), the 
equation admits a stationary state satisfying 

x{w) = , , , , . (3) 

This distribution is analogous to the Bose distribution, with w playing the role of 
the Boltzmann factor. Therefore, if / dw p{w) / {1 — w /wma.^ < oo, the Einstein 
condensation can take place. This transition is known as the error threshold in 
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evolutionary models, and separates a poorly adapted phase at high mutation rate from 
a well adapted phase at low mutation rate. In the well adapted (condensed) phase, 
found for fi < fic, a finite fraction u of the population has the maximum fitness Wmax- 
Here u and /ic are given by 

u=l ; /ic = / dw y . 4 

In this phase the reproduction rate of the individuals with maximal fitness equals 
one: (1 — /i)wmax/ {w) = 1, while all the others have reproduction rates smaller than 
one. Conversely, in the poorly adapted (uncondensed) phase, less than maximally fit 
individuals can reproduce. If the integral J dw p{w)/ (1 — w/wmax) is divergent no error 
threshold appears. 

The situation where the fitness distribution p{w) extends to infinity is more complex 
and interesting. In this case, equation ^ does not admit a stationary solution and one 
obtains instead a run-away. However, in contrast with the case of compact support, the 
dynamics for a finite population with ^ 1 individuals can now be substantially 
different from the infinite population limit. In fact, the evolution process can be 
described as a non stationary record process where condensates of a given fitness are 
replaced by ones of higher fitness ([IH]), with persistence times that becomes longer and 
longer as the fitness increases. It has been noticed in [17] that, however, an uncondensed 
phase can be metastable for long times. Let us denote by ifmut(^) the maximum fitness 
appearing among the mutants in t generations. As at each time step a number Nfi of 
new mutants appear, the maximum fitness Wmut(l) for one generation satisfies 



oo 



Nfi / dw p{w) ~ 1. (5) 

If the reproduction rate (1 — fi)w^ut{'^) / {w) is smaller than one, the uncondensed phase 
will be stable. This quantity depends on t via the average (w). Then the stability 
time can be determined by the condition that the maximal fitness of mutants over an 
interval of t generations, as determined by the relation Npt ^^^^ dw p{w) ~ 1 has a 
reproduction rate equal to one. In this way one gets an effective A^- and t- dependent 
error threshold, which can be approximately described by equations 

p /—(I) p(^) 
V = I ; Ur. = I dw -, — 



, Pc = dw -. -— . (6) 

Pc Jo l-w/w^ut{t) 

In figure [T] we compare this analysis with numerical simulations for the lognormal 

distribution with parameter a = 0.3, showing the fraction u of the condensate for two 

independent populations of A^ = 1000 individuals evolving for t = 5000 generations, 

together with the predictions of equation (|6|) as a function of the mutation rate. 



4. Introducing the fitness decay 



Let us now study the effect of the fitness decay and set h > 0. We continue to dwell 
mainly on the case of lognormal fitness distribution p{w) with parameter a = 0.3. In 
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Figure 1. Condensate fraction u as a function of the mutation rate for two 
independent populations of 1000 individuals evolving for t = 5000 generations, together 
with the theoretical result. The data display a well defined error threshold in agreement 
with the theory. 

order to study the possibility of a non zero condensate we observe that while for h = 
the maximally occupied genome should usually coincide with the one of highest fitness, 
this is not necessarily true if h > 0. Let us define therefore, at any given time, the 
leader strain in the population as the one that is populated by the largest fraction 
of individuals, and the max as the one whose fitness is largest. In figure [2] we plot 
the occupation fraction u of the leader as a function of the mutation rate, for several 
values of h, in a population of size = 1000. The figure shows that while the leader 
occupation fraction is a decreasing function of h, the error threshold is not destroyed 
by the immunity response and the critical value fic is roughly independent of h. We 
also observe that a non zero h introduces a natural time scale into the system, and 
one should expect that stationarity is reached after a finite relaxation time, so that the 
critical value of fi becomes time independent. In figure |3] we compare the fitness of the 
leader as a function of time in the condensed phase for h = to the one corresponding 
to a small value of h. One can manifestly see that while stationarity does not hold in 
the former case, it holds in the latter. In figure |4] we plot the occupation of the leader 
as a function of the mutation rate for various values of A^. One can see that the error 
threshold slowly moves to higher values of jj; as N increases. This also appears in the 
behavior of the leader fitness wiead and of the maximal fitness w^ax in the system which 
exhibit a maximum at the error threshold. We plot these quantities together with the 
average fitness in figure |5} In order to obtain a less cluttered plot, the fitnesses are 

rescaled by the factor f{N) = exp (^20^ \og{N / \f2'na^)^ , where a = 0.3 is the width 

of the distribution of logwo, expected on the basis of extreme- value statistics for the 
lognormal distribution. The range of variation of this factor is too small to warrant 
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Occupation of MOS vs mu (N=1000) 




Figure 2. Occupation fraction v of the leader for N — 1000, a — 0.3 and several 
values of h, as a function of /i. The error threshold takes place at an /i-independent 
value of fi if h is not too large. In the inset we plot vh^^^ exhibiting the proportionality 
of V to h^^^^ in the condensed phase. 
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Figure 3. Fitness wioad of the leader as a function of time for the lognormal fitness 
distribution with a = 0.3, /i = 0.1, and for h = Q and h = 0.01 respectively. For h = 
the fitness follows a record process increasing on average with time. As soon as /i > 
the process becomes stationary. 

drawing any conclusion from this observation. Notice that while the leader and the 
maximal fitnesses reach the maximum at the error threshold, the average fitness has a 
maximum at a lower value of fi. Close to the maximum of the average fitness the leader 
and the max occupation fractions turn out to be roughly independent of A^, as it can 
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Figure 4. Occupation ly of the leader for h — 0.01 as a function of for TV = 
1000, 2000, 4000, 10000. The error threshold slowly moves to higher values of as 
increases. 



1.2 




Figure 5. Maximal Wmax (upper line), leader wioad (middle line) and average {w) 
(lower line) fitness for h = 0.01, as a function of ^, for N = 1000,2000,4000,10000. 

The fitnesses are rescaled by the factor f{N) ~ exp 2a'^ \og{N / ^/2TTa'^)J , where 

a — 0.3 is the width of the distribution of logwo, expected on the basis of extreme- 
value statistics. The error threshold is slowly displaced to higher values of /x as 
increases. 



be seen from figure |6| which plots these two quantities as a function of h. Conversely, 
figure It] shows that the fitness of the leader and of the max are functions of h/N: for 
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h 



Figure 6. Occupation of leader (upper) and max (lower) as a function of for /i = 0.1 
and various values of iV = 1000, 2000, 4000, 8000. 




Figure 7. Average (main panel) and leader (inset) fitness as a function of h/N for 
/Lt = 0.1 and various values of iV = 1000, 2000, 4000, 8000. 
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5. Dynamics 



The results of the previous section clearly show that the error threshold persists even in 
the presence of a small amount of immune response in the host. In order to gain a better 
understanding of the evolving quasispecies regime characterizing the well adapted phase, 
let us look at the temporal behavior of the system. In figure [8] we show the occupation 
fractions of the max and of the leader as a function of time, in the well adapted phase 
with fi = 0.1 and h = 0.01 in a population of = 1000 individuals. We can see that 
both quantities exhibit a regular behavior in time, which can be qualitatively described 
as follows. When a new fitter mutant establishes in the population, it substitutes the 
previous leader and remains leader for a while before being at his turn substituted. 
We find therefore a kind of punctuated equilibrium dynamics, as announced in the 
introduction. Figure [9] shows the corresponding behavior of the average fitness of the 
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Figure 8. Time dependence of the occupation fraction of the max and of the leader in 
the evolving quasispecies regime. Upper panel: the system is far from the threshold, 
fluctuations are small, the substitution time is the only random quantity, N = 40.000, 
/i — 0.05, h — 0.01. Lower panel: the system is close to the error threshold, fluctuations 
are large, N = 1000, pi = 0.1, h = 0.01. 



leader and of the max respectively. One sees that the fitness of the max stays constant 
for a while after the emergence of the mutant, and then, when the max replaces the 
leader, it rapidly decreases until a new leader takes over. It is interesting to look at the 
reproduction rate of the leader strain. As shown in figure 10, this is observed to differ 
substantially from one only close to substitution events. Just before substitution, the 
challenged leader has a reproduction rate which is smaller than one, while it is larger 
then one just after substitution. In the periods when the leader largely dominates the 
population (dominance periods) the reproduction ratio is very close to one, and diversity 
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Figure 9. Behavior of the fitness in the quasispecies regime. Green: max; red: leader. 
For both panels the parameters are the same as in the corresponding ones of figure |8] 



in the population is maintained by mutants. We measured the average duration tgubs of 
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Figure 10. Reproduction rate of the max and of the leader, for parameters 
corresponding to the lower panels of [8] and [9] Notice that the reproduction rate of 
the max is always larger than 1, while the reproduction rate of the leader is noticeably 
different from one only close to a substitution event. 



the leadership, i.e., the time interval between successive substitutions. In figure 11 we 

plot (tsubs ~ l)h^^'^ vs. h, showing that this quantity it is roughly proportional to 

and essentially independent of fi, in the quasispecies regime. Given the natural decay 
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time of fitness on time scales of order 1/h it would have been natural to hypothesize a 
substitution time of the same order. Our numerical result show that substitutions being 
separated by times of order h~^^'^ are much more frequent. 
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Figure 11. Plot of (tsubs ~ as a function of h for fi = 

0.05,0.1,0.15,0.2,0.25,0.3,0.35. The curves are roughly constant in the quasispecies 
regime. 



6. The Diluted Champion Process 

Certain aspects of the evolving quasispecies dynamics at low h and fi can be interpreted 
by introducing an effective stochastic process for the substitution. We start by defining 
the Champion Process where at each time there is a leader. The performance of 
the leader, which is represented by a fitness value, declines exponentially in time: 
Wt+i = XWt with A < 1. The leader is challenged at regular intervals of time, and 
the challenger's fitness is extracted randomly from a distribution p*{W). If the fitness 
of the challenger exceeds the one of the leader in charge, the challenger substitutes 
the old leader. Otherwise there is no substitution and the old champion remains in 
charge. The Diluted Champion Process (DCP) is a simple variation of the process 
where the substitution process is not deterministic. Substitutions can occur with a 
certain probability that depends on the fitness ratio between the challenger and the 
champion in charge. This model can obviously be adapted to competition situations 
where the performances of the individuals naturally decline in time. 

Now, in the evolving quasispecies regime of our model, while the substitution times 
and the leaders fitnesses are random quantities, the dynamics between punctuations is 
essentially deterministic. Moreover, the champion substitution process requires times 
that are much shorter than the typical times of change of the fitness. Indeed as figures |8] 
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and |9] show, the fitness of the champion starts to decay as soon as the new challenger 
takes over. These observations make the DCP description pertinent for our model. 
During the dominance periods, at each time step the champion is challenged by a 
number of mutants M Poisson distributed with (M) = Nfi, with random fitnesses 
drawn from p{w). By elementary extreme values statistics, the best fit challenger has 
therefore a fitness distribution given by p*{w) = pN p{w) e"'^^^™ p^'^'\ This will have 
a fixation probability f {w / w\ea,Aer) ■, which can be estimated by the Kimura expression: 
f{x) = (1 — — x^^). Thus the probability of survival in one round of a leader 

with fitness wieader is given by 

$(wieader) = 1 - [ dw p* (w) f ( ■ (7) 
J V ^leader/ 

At this point we will assume, consistently with the simulations, that for small h we can 
neglect the fixation time with respect to the leader strain lifetime, and suppose that for 
a champion Wt+i = Xwt, with a constant A equal to A = exp(— n*/i) where n* is the 
condensate fraction. In the case of the DCP, one can safely assume n* ~ 1, leading to 

A = e-^ 

It is possible to formally write the basic formula for the probability that a champion 
with initial fitness wq is substituted after exactly t + 1 time steps by a new champion 
with fitness Wi, namely: 

PtK I wo) = n $KA^)P*K)/ . (8) 

From this equation all the the properties of the DCP can be in principle derived. One can 
express the conditional probability that the leader fitness equals wi given the previous 
leader fitness Wq by 

oo 

M{wi\wo) = Y,Pt{wi\wo). (9) 

Then one can in principle obtain the stationary distribution of the leader fitness p{w) 
from 



p{wi) = J dwo M{wi I Wo)p{wo), (10) 
and the distribution of the substitution time at stationarity from 

q{t) = J dwo dwi Pt{wi \ Wq)p{wq). (11) 
Unfortunately these equations are not easily analyzed, so in order to test the validity 



of this simple effective model we had to simulate it. In figure 12 we show the typical 
evolution of the leader's fitness as a function of time. The DCP clearly reproduces the 
qualitative features of our model. 



7. Conclusions and perspectives 



We have introduced a simple effective model describing the evolution of a population 
of pathogens in the presence of the immunological response of the host. The model is 
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Figure 12. Leader's fitness as a function of time for tlie main model and for the DCP 
with N = 400, fi = 0.05, h = 0.01 and a = 0.3. The DCP clearly reproduces the 
qualitative features of the main model. 

simple enough to be amenable to a systematic numerical investigation, and allows one 
to identify a parameter region in which the coexistence of a well-defined quasispecies 
and of a fast turnover of the dominant strain is be quite robust. The behavior of the 
model in this region can be understood in terms of a further simplified model: the 
Diluted Champion Process. Our model can be interpreted in several ways: on the one 
hand, as describing the evolving viral strains present in the host population, e.g., in 
the case of the annual influenza epidemics; on the other hand, as the evolution of the 
viral population in a chronic infection of a single host individual, as in intra-host HIV 
evolution. In this case, fluctuations in the strength of immune response could cause 
proliferation of the number of strains and then the failure of the immunity system to 
downregulate all the viral strains (see, e.g., [IS]). It is possible to obtain a more realistic 
description of this regime by, e.g., relaxing the fixed population constraint. It is also 
possible to consider a higher degree of correlation in the fitness landscape, in order to 
understand the extent of the stability of the punctuated equilibrium regime. 

We are working on a generalization of our model taking into account the existence 
of different regions. They can be understood as different climatic regions in the case 
of epidemics, or localization in different organs in the case of intra-host infection. This 
investigation will allow us to understand better the different roles played by different 
regions: seeder, reservoirs, etc., as observed in recent analyses of epidemiological data 
collected in the last years ([lOl [H]), or to identify possible gene-surfing phenomena in 
the diffusion of new strains from external regions (|20]). 
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